1.
medRxiv ; 2024 Feb 06.
Article En | MEDLINE | ID: mdl-38370766

INTRODUCTION: Patients with Alzheimer's disease (AD) are often misclassified in electronic health records (EHRs) when identification relies solely on diagnostic codes. This study aims to develop a more accurate computable phenotype (CP) for identifying AD patients by using both structured and unstructured EHR data. METHODS: We used EHRs from the University of Florida Health (UF Health) system and created rule-based CPs iteratively through manual chart reviews. The CPs were then validated using data from the University of Texas Health Science Center at Houston (UT Health) and the University of Minnesota (UMN). RESULTS: Our best-performing CP is "patient has at least 2 AD diagnoses and AD-related keywords", with an F1-score of 0.817 at UF Health, and 0.961 and 0.623 at UT Health and UMN, respectively. DISCUSSION: We developed and validated rule-based CPs for AD identification with good performance, which is crucial for studies that aim to use real-world data such as EHRs.
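
As an illustration of the rule quoted in the abstract above, the following is a minimal Python sketch of a rule-based CP and of the F1-score used to evaluate it. The record fields, the function names, and the keyword threshold of 1 are assumptions made for this example; the abstract does not specify how keywords are counted, and this is not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class PatientRecord:
    """Hypothetical per-patient summary; field names are illustrative only."""
    ad_diagnosis_count: int  # AD diagnosis codes found in structured data
    ad_keyword_hits: int     # AD-related keyword matches found in clinical notes


def meets_cp(record, min_diagnoses=2, min_keywords=1):
    """Rule: at least `min_diagnoses` AD diagnoses AND AD-related keywords.

    min_keywords=1 is an assumption; the abstract gives no keyword threshold.
    """
    return (record.ad_diagnosis_count >= min_diagnoses
            and record.ad_keyword_hits >= min_keywords)


def f1_score(predicted, actual):
    """F1 = harmonic mean of precision and recall against chart-review labels."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up records and chart-review labels, for illustration only.
records = [PatientRecord(2, 1), PatientRecord(1, 3),
           PatientRecord(3, 0), PatientRecord(4, 2)]
chart_review = [True, True, False, True]
predictions = [meets_cp(r) for r in records]
print(round(f1_score(predictions, chart_review), 3))
```

In practice the predicted labels would be compared against manually reviewed charts at each site, which is how site-specific F1-scores such as those reported above would be obtained.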

2.
Ann Intern Med ; 177(2): 165-176, 2024 02.
Article En | MEDLINE | ID: mdl-38190711

BACKGROUND: The efficacy of the BNT162b2 vaccine in pediatrics was assessed by randomized trials before the Omicron variant's emergence. The long-term durability of vaccine protection in this population during the Omicron period remains limited. OBJECTIVE: To assess the effectiveness of BNT162b2 in preventing infection and severe diseases with various strains of the SARS-CoV-2 virus in previously uninfected children and adolescents. DESIGN: Comparative effectiveness research accounting for underreported vaccination in 3 study cohorts: adolescents (12 to 20 years) during the Delta phase and children (5 to 11 years) and adolescents (12 to 20 years) during the Omicron phase. SETTING: A national collaboration of pediatric health systems (PEDSnet). PARTICIPANTS: 77 392 adolescents (45 007 vaccinated) during the Delta phase and 111 539 children (50 398 vaccinated) and 56 080 adolescents (21 180 vaccinated) during the Omicron phase. INTERVENTION: First dose of the BNT162b2 vaccine versus no receipt of COVID-19 vaccine. MEASUREMENTS: Outcomes of interest include documented infection, COVID-19 illness severity, admission to an intensive care unit (ICU), and cardiac complications. The effectiveness was reported as (1-relative risk)*100, with confounders balanced via propensity score stratification. RESULTS: During the Delta period, the estimated effectiveness of the BNT162b2 vaccine was 98.4% (95% CI, 98.1% to 98.7%) against documented infection among adolescents, with no statistically significant waning after receipt of the first dose. An analysis of cardiac complications did not suggest a statistically significant difference between vaccinated and unvaccinated groups. During the Omicron period, the effectiveness against documented infection among children was estimated to be 74.3% (CI, 72.2% to 76.2%). Higher levels of effectiveness were seen against moderate or severe COVID-19 (75.5% [CI, 69.0% to 81.0%]) and ICU admission with COVID-19 (84.9% [CI, 64.8% to 93.5%]). 
Among adolescents, the effectiveness against documented Omicron infection was 85.5% (CI, 83.8% to 87.1%), with 84.8% (CI, 77.3% to 89.9%) against moderate or severe COVID-19, and 91.5% (CI, 69.5% to 97.6%) against ICU admission with COVID-19. The effectiveness of the BNT162b2 vaccine against the Omicron variant declined 4 months after the first dose and then stabilized. The analysis showed a lower risk for cardiac complications in the vaccinated group during the Omicron variant period. LIMITATION: Observational study design and potentially undocumented infection. CONCLUSION: This study suggests that BNT162b2 was effective for various COVID-19-related outcomes in children and adolescents during the Delta and Omicron periods, and there is some evidence of waning effectiveness over time. PRIMARY FUNDING SOURCE: National Institutes of Health.


BNT162 Vaccine , COVID-19 , United States , Humans , Adolescent , Child , COVID-19 Vaccines , COVID-19/prevention & control , Comparative Effectiveness Research , Hospitalization
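
The effectiveness measure named in the abstract above, (1 - relative risk)*100, can be sketched in Python as follows. The function name and the counts are made up for this example, and the sketch uses crude risks only: it deliberately omits the propensity score stratification and the adjustment for underreported vaccination that the study describes.

```python
def vaccine_effectiveness(cases_vax, n_vax, cases_unvax, n_unvax):
    """Return effectiveness in percent as (1 - relative risk) * 100.

    Crude (unadjusted) calculation; confounder balancing is not modeled here.
    """
    if n_vax <= 0 or n_unvax <= 0:
        raise ValueError("group sizes must be positive")
    if cases_unvax <= 0:
        raise ValueError("relative risk is undefined with no unvaccinated cases")
    risk_vax = cases_vax / n_vax
    risk_unvax = cases_unvax / n_unvax
    relative_risk = risk_vax / risk_unvax
    return (1 - relative_risk) * 100


# Made-up counts: 50 cases among 10,000 vaccinated, 200 among 10,000 unvaccinated.
print(round(vaccine_effectiveness(50, 10_000, 200, 10_000), 1))
```

A relative risk of 0.25 in this toy example corresponds to an effectiveness of 75%, which is how percentages such as those in the RESULTS section should be read.
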
3.
Neuro Oncol ; 2023 Dec 23.
Article En | MEDLINE | ID: mdl-38141226

BACKGROUND: Glioblastoma (GBM) is the most common malignant brain tumor, and thus it is important to be able to identify patients with this diagnosis for population studies. However, this can be challenging as diagnostic codes are non-specific. The aim of this study was to create a computable phenotype (CP) for GBM from structured and unstructured data to identify patients with this condition in a large electronic health record (EHR). METHODS: We used the UF Health Integrated Data Repository, a centralized clinical data warehouse that stores clinical and research data from various sources within the UF Health system, including the EHR system. We performed multiple iterations to refine the GBM-relevant diagnosis codes, procedure codes, medication codes, and keywords through manual chart review of patient data. We then evaluated the performance of the candidate CPs constructed from the relevant codes and keywords. RESULTS: We performed six rounds of manual chart reviews to refine the CP elements. The final CP algorithm for identifying GBM patients was selected based on the best F1-score. Overall, the CP rule "if the patient had at least 1 relevant diagnosis code and at least 1 relevant keyword" demonstrated the highest F1-score using both structured and unstructured data. Thus, it was selected as the best-performing CP rule. CONCLUSIONS: We developed a CP algorithm for identifying patients with GBM using both structured and unstructured EHR data from a large tertiary care center. The final algorithm achieved an F1-score of 0.817, indicating high performance that minimizes possible biases from misclassification errors.

4.
medRxiv ; 2023 Nov 13.
Article En | MEDLINE | ID: mdl-38014095

Background: The efficacy of the BNT162b2 vaccine in pediatrics was assessed by randomized trials before the Omicron variant's emergence. The long-term durability of vaccine protection in this population during the Omicron period remains limited. Objective: To assess the effectiveness of BNT162b2 in preventing infection and severe diseases with various strains of the SARS-CoV-2 virus in previously uninfected children and adolescents. Design: Comparative effectiveness research accounting for underreported vaccination in three study cohorts: adolescents (12 to 20 years) during the Delta phase, children (5 to 11 years) and adolescents (12 to 20 years) during the Omicron phase. Setting: A national collaboration of pediatric health systems (PEDSnet). Participants: 77,392 adolescents (45,007 vaccinated) in the Delta phase, 111,539 children (50,398 vaccinated) and 56,080 adolescents (21,180 vaccinated) in the Omicron period. Exposures: First dose of the BNT162b2 vaccine vs. no receipt of COVID-19 vaccine. Measurements: Outcomes of interest include documented infection, COVID-19 illness severity, admission to an intensive care unit (ICU), and cardiac complications. The effectiveness was reported as (1-relative risk)*100% with confounders balanced via propensity score stratification. Results: During the Delta period, the estimated effectiveness of BNT162b2 vaccine was 98.4% (95% CI, 98.1 to 98.7) against documented infection among adolescents, with no significant waning after receipt of the first dose. An analysis of cardiac complications did not find an increased risk after vaccination. During the Omicron period, the effectiveness against documented infection among children was estimated to be 74.3% (95% CI, 72.2 to 76.2). Higher levels of effectiveness were observed against moderate or severe COVID-19 (75.5%, 95% CI, 69.0 to 81.0) and ICU admission with COVID-19 (84.9%, 95% CI, 64.8 to 93.5). 
Among adolescents, the effectiveness against documented Omicron infection was 85.5% (95% CI, 83.8 to 87.1), with 84.8% (95% CI, 77.3 to 89.9) against moderate or severe COVID-19, and 91.5% (95% CI, 69.5 to 97.6) against ICU admission with COVID-19. The effectiveness of the BNT162b2 vaccine against the Omicron variant declined after 4 months following the first dose and then stabilized. The analysis revealed a lower risk of cardiac complications in the vaccinated group during the Omicron variant period. Limitations: Observational study design and potentially undocumented infection. Conclusions: Our study suggests that BNT162b2 was effective for various COVID-19-related outcomes in children and adolescents during the Delta and Omicron periods, and there is some evidence of waning effectiveness over time. Primary Funding Source: National Institutes of Health.

5.
J Am Med Inform Assoc ; 31(1): 165-173, 2023 12 22.
Article En | MEDLINE | ID: mdl-37812771

OBJECTIVE: Having sufficient population coverage from electronic health record (EHR)-connected health systems is essential for building a comprehensive EHR-based diabetes surveillance system. This study aimed to establish an EHR-based type 1 diabetes (T1D) surveillance system for children and adolescents across racial and ethnic groups by identifying the minimum population coverage from EHR-connected health systems needed to accurately estimate T1D prevalence. MATERIALS AND METHODS: We conducted a retrospective, cross-sectional analysis involving children and adolescents <20 years old identified from the OneFlorida+ Clinical Research Network (2018-2020). T1D cases were identified using a previously validated computable phenotyping algorithm. The T1D prevalence for each ZIP Code Tabulation Area (ZCTA, 5 digits), defined as the number of T1D cases divided by the total number of residents in the corresponding ZCTA, was calculated. Population coverage for each ZCTA was measured using the observed health system penetration rate (HSPR), calculated as the ratio of residents in the corresponding ZCTA captured by OneFlorida+ to the overall population in the same ZCTA reported by the Census. We used a recursive partitioning algorithm to identify the minimum required observed HSPR to estimate T1D prevalence and compared our estimate with the reported T1D prevalence from the SEARCH study. RESULTS: Observed HSPRs of 55%, 55%, and 60% were identified as the minimum thresholds for the non-Hispanic White, non-Hispanic Black, and Hispanic populations, respectively. The estimated T1D prevalence for the non-Hispanic White and non-Hispanic Black populations was 2.87 and 2.29 per 1000 youth, respectively, which is comparable to the reference study's estimation. The estimated prevalence of T1D for Hispanics (2.76 per 1000 youth) was higher than the reference study's estimation (1.48-1.64 per 1000 youth). The standardized T1D prevalence in the overall Florida population was 2.81 per 1000 youth in 2019. 
CONCLUSION: Our study provides a method to estimate T1D prevalence in children and adolescents using EHRs and reports the estimated HSPRs and prevalence of T1D for different race and ethnicity groups to facilitate EHR-based diabetes surveillance.


Diabetes Mellitus, Type 1 , Child , Humans , Adolescent , Young Adult , Adult , Diabetes Mellitus, Type 1/epidemiology , Prevalence , Electronic Health Records , Cross-Sectional Studies , Retrospective Studies
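
The two quantities defined in the abstract above, the observed HSPR and the T1D prevalence per ZCTA, can be sketched in Python as below. The class, field names, ZCTA labels, and counts are hypothetical. Pooling only the ZCTAs that meet an HSPR threshold, and using captured residents as the prevalence denominator, are assumptions about how such a threshold might be applied; the recursive partitioning step that selects the threshold is not reproduced.

```python
from dataclasses import dataclass


@dataclass
class ZctaCounts:
    """Hypothetical per-ZCTA counts; all values below are made up."""
    zcta: str
    captured_residents: int  # residents captured by the EHR network
    census_population: int   # population reported by the Census
    t1d_cases: int           # cases flagged by the computable phenotype


def observed_hspr(z):
    """Observed health system penetration rate: captured / census population."""
    return z.captured_residents / z.census_population


def pooled_prevalence_per_1000(zctas, min_hspr):
    """Pool cases over ZCTAs whose observed HSPR meets the threshold.

    Assumption: captured residents serve as the denominator.
    """
    kept = [z for z in zctas if observed_hspr(z) >= min_hspr]
    cases = sum(z.t1d_cases for z in kept)
    residents = sum(z.captured_residents for z in kept)
    prevalence = 1000 * cases / residents if residents else float("nan")
    return [z.zcta for z in kept], prevalence


zctas = [
    ZctaCounts("A", 600, 1000, 2),
    ZctaCounts("B", 300, 1000, 1),
    ZctaCounts("C", 700, 1000, 2),
]
kept_ids, prevalence = pooled_prevalence_per_1000(zctas, min_hspr=0.55)
print(kept_ids, round(prevalence, 2))
```

With a 55% threshold, ZCTA "B" (30% coverage) is excluded from the pooled estimate, which illustrates why a minimum coverage level matters for prevalence estimation.
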
6.
Alzheimers Dement ; 19(8): 3506-3518, 2023 08.
Article En | MEDLINE | ID: mdl-36815661

INTRODUCTION: This study aims to explore machine learning (ML) methods for early prediction of Alzheimer's disease (AD) and related dementias (ADRD) using the real-world electronic health records (EHRs). METHODS: A total of 23,835 ADRD and 1,038,643 control patients were identified from the OneFlorida+ Research Consortium. Two ML methods were used to develop the prediction models. Both knowledge-driven and data-driven approaches were explored. Four computable phenotyping algorithms were tested. RESULTS: The gradient boosting tree (GBT) models trained with the data-driven approach achieved the best area under the curve (AUC) scores of 0.939, 0.906, 0.884, and 0.854 for early prediction of ADRD 0, 1, 3, or 5 years before diagnosis, respectively. A number of important clinical and sociodemographic factors were identified. DISCUSSION: We tested various settings and showed the predictive ability of using ML approaches for early prediction of ADRD with EHRs. The models can help identify high-risk individuals for early informed preventive or prognostic clinical decisions.


Alzheimer Disease , Humans , Alzheimer Disease/diagnosis , Alzheimer Disease/epidemiology , Electronic Health Records , Prognosis , Machine Learning , Algorithms
7.
IEEE Int Conf Healthc Inform ; 2022: 618-619, 2022 Jun.
Article En | MEDLINE | ID: mdl-36168559

This study aims to develop a natural language processing (NLP) tool to extract the pulmonary nodules and nodule characteristics information from free-text clinical narratives. We identified a cohort of 3,080 patients who received low dose computed tomography (LDCT) at the University of Florida health system and collected their clinical narratives including radiology reports in their electronic health records (EHRs). Then, we manually annotated 394 reports as the gold-standard corpus and explored three state-of-the-art transformer-based NLP methods. The best model achieved an F1-score of 0.9279.

8.
Int J Med Inform ; 165: 104834, 2022 09.
Article En | MEDLINE | ID: mdl-35863206

OBJECTIVE: We summarized more than a decade of research on semantic data integration (SDI) published since 2009, aiming to: (1) summarize the state-of-the-art approaches to integrating health data and information; and (2) identify the main gaps and challenges of integrating health data and information from multiple levels and domains. MATERIALS AND METHODS: We used PubMed, as our focus is applications of SDI in biomedical domains, and followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guideline to search for and report relevant studies published between January 1, 2009 and December 31, 2021. We used Covidence, a systematic review management system, to carry out this scoping review. RESULTS: The initial PubMed search using the two sets of keywords returned 5,326 articles. We then removed 44 duplicates, and 5,282 articles were retained for abstract screening. After abstract screening, we included 246 articles for full-text screening, among which 87 articles were deemed eligible for full-text extraction. We summarized the 87 articles from four aspects: (1) methods for the global schema; (2) data integration strategies (i.e., federated system vs. data warehousing); (3) the sources of the data; and (4) downstream applications. CONCLUSION: SDI approaches can effectively resolve the semantic heterogeneities across different data sources. We identified two key gaps and challenges in existing SDI studies: (1) many of the existing SDI studies used data from only single-level data sources (e.g., integrating individual-level patient records from different hospital systems), and (2) documentation of the data integration processes is sparse, threatening the reproducibility of SDI studies.


Information Storage and Retrieval , Semantics , Humans , Mass Screening , Reproducibility of Results
9.
J Affect Disord ; 308: 587-595, 2022 07 01.
Article En | MEDLINE | ID: mdl-35427717

BACKGROUND: There is limited evidence from cohort studies on the longitudinal associations between maternal dietary patterns and antenatal depression (AD) across the entire gestation period. METHODS: Data came from the Chinese Pregnant Women Cohort Study. The qualitative food frequency questionnaire (Q-FFQ) and Edinburgh Postnatal Depression Scale (EPDS) were used to collect diet and depression data. Dietary patterns were derived by using factor analysis. Generalized estimating equation models were used to analyze the association between diet and AD. RESULTS: A total of 4139 participants who completed all 3 waves of follow-up were included. Four constant diets were identified, namely plant-based, animal-protein, vitamin-rich and oily-fatty patterns. The prevalence of depression was 23.89%, 21.12% and 22.42% for the first, second and third trimesters, respectively. There were inverse associations of the plant-based pattern (OR:0.85, 95%CI:0.75-0.97), animal-protein pattern (OR:0.85, 95%CI:0.74-0.99) and vitamin-rich pattern (OR:0.58, 95%CI:0.50-0.67) with AD, and a positive association between the oily-fatty pattern and AD (OR:1.47, 95%CI:1.29-1.68). Except for the plant-based pattern, the other patterns had linear trend relationships with AD (Ptrend < 0.05). Moreover, a 1-SD increase in vitamin-rich pattern scores was associated with a 20% lower AD risk (OR:0.80, 95%CI:0.76-0.84), while a 1-SD increase in oily-fatty pattern scores was associated with a 19% higher risk (OR:1.19, 95%CI:1.13-1.24). Interactions between dietary patterns and lifestyle habits were observed. LIMITATIONS: The self-reported Q-FFQ and EPDS may cause recall bias. CONCLUSIONS: There are longitudinal associations between maternal dietary patterns and antenatal depression. Our findings are expected to provide evidence for a dietary therapy strategy to improve or prevent depression during pregnancy.


Depression , Pregnant Women , Animals , China/epidemiology , Cohort Studies , Depression/epidemiology , Diet , Female , Humans , Pregnancy , Vitamins
10.
Sci Total Environ ; 806(Pt 1): 150352, 2022 Feb 01.
Article En | MEDLINE | ID: mdl-34555607

BACKGROUND: The effects of weather periods, race/ethnicity, and sex on environmental triggers for respiratory exacerbations are not well understood. This study linked the OneFlorida network (~15 million patients) with an external exposome database to analyze environmental triggers for asthma, bronchitis, and COPD exacerbations while accounting for seasonality, sex, and race/ethnicity. METHODS: This is a case-crossover study of OneFlorida database from 2012 to 2017 examining associations of asthma, bronchitis, and COPD exacerbations with exposures to heat index, PM2.5 and O3. We spatiotemporally linked exposures using patients' residential addresses to generate average exposures during hazard and control periods, with each case serving as its own control. We considered age, sex, race/ethnicity, and neighborhood deprivation index as potential effect modifiers in conditional logistic regression models. RESULTS: A total of 1,148,506 exacerbations among 533,446 patients were included. Across all three conditions, hotter heat indices conferred increasing exacerbation odds, except during November to March, where the opposite was seen. There were significant differences when stratified by race/ethnicity (e.g., for asthma in April, May, and October, heat index quartile 4, odds were 1.49 (95% confidence interval (CI) 1.42-1.57) for Non-Hispanic Blacks and 2.04 (95% CI 1.92-2.17) for Hispanics compared to 1.27 (95% CI 1.19-1.36) for Non-Hispanic Whites). Pediatric patients' odds of asthma and bronchitis exacerbations were significantly lower than adults in certain circumstances (e.g., for asthma during June - September, pediatric odds 0.71 (95% CI 0.68-0.74) and adult odds 0.82 (95% CI 0.79-0.85) for the highest quartile of PM2.5). CONCLUSION: This study of acute exacerbations of asthma, bronchitis, and COPD found exacerbation risk after exposure to heat index, PM2.5 and O3 varies by weather period, age, and race/ethnicity. 
Future work can build upon these results to alert vulnerable populations to exacerbation triggers.


Asthma , Pulmonary Disease, Chronic Obstructive , Respiration Disorders , Adult , Asthma/epidemiology , Big Data , Child , Cross-Over Studies , Humans
11.
J Am Med Inform Assoc ; 28(9): 2050-2067, 2021 08 13.
Article En | MEDLINE | ID: mdl-34151987

OBJECTIVE: To summarize how artificial intelligence (AI) is being applied in COVID-19 research and determine whether these AI applications integrated heterogeneous data from different sources for modeling. MATERIALS AND METHODS: We searched 2 major COVID-19 literature databases, the National Institutes of Health's LitCovid and the World Health Organization's COVID-19 database on March 9, 2021. Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guideline, 2 reviewers independently reviewed all the articles in 2 rounds of screening. RESULTS: In the 794 studies included in the final qualitative analysis, we identified 7 key COVID-19 research areas in which AI was applied, including disease forecasting, medical imaging-based diagnosis and prognosis, early detection and prognosis (non-imaging), drug repurposing and early drug discovery, social media data analysis, genomic, transcriptomic, and proteomic data analysis, and other COVID-19 research topics. We also found that there was a lack of heterogeneous data integration in these AI applications. DISCUSSION: Risk factors relevant to COVID-19 outcomes exist in heterogeneous data sources, including electronic health records, surveillance systems, sociodemographic datasets, and many more. However, most AI applications in COVID-19 research adopted a single-sourced approach that could omit important risk factors and thus lead to biased algorithms. Integrating heterogeneous data for modeling will help realize the full potential of AI algorithms, improve precision, and reduce bias. CONCLUSION: There is a lack of data integration in the AI applications in COVID-19 research and a need for a multilevel AI framework that supports the analysis of heterogeneous data from different sources.


Artificial Intelligence , Biomedical Research/trends , COVID-19 , Algorithms , Databases as Topic , Humans , National Institutes of Health (U.S.) , Proteomics , United States , World Health Organization
12.
Crit Care Explor ; 3(6): e0456, 2021 Jun.
Article En | MEDLINE | ID: mdl-34136827

OBJECTIVES: To determine if early CNS symptoms are associated with severe coronavirus disease 2019. DESIGN: A retrospective, observational case series study design. SETTING: Electronic health records were reviewed for patients from five healthcare systems across the state of Florida, United States. PATIENTS: A clinical sample (n = 36,615) of patients with confirmed diagnosis of coronavirus disease 2019 were included. Twelve percent (n = 4,417) of the sample developed severe coronavirus disease 2019, defined as requiring critical care, mechanical ventilation, or diagnosis of acute respiratory distress syndrome, sepsis, or severe inflammatory response syndrome. INTERVENTIONS: None. MEASUREMENT AND MAIN RESULTS: We reviewed the electronic health record for diagnosis of early CNS symptoms (encephalopathy, headache, ageusia, anosmia, dizziness, acute cerebrovascular disease) between 14 days before the diagnosis of coronavirus disease 2019 and 8 days after the diagnosis of coronavirus disease 2019, or before the date of severe coronavirus disease 2019 diagnosis, whichever came first. Hierarchical logistic regression models were used to examine the odds of developing severe coronavirus disease 2019 based on diagnosis of early CNS symptoms. Severe coronavirus disease 2019 patients were significantly more likely to have early CNS symptoms (32.8%) compared with nonsevere patients (6.11%; χ2[1] = 3,266.08, p < 0.0001, φ = 0.29). After adjusting for demographic variables and pertinent comorbidities, early CNS symptoms were significantly associated with severe coronavirus disease 2019 (odds ratio = 3.21). Diagnosis of encephalopathy (odds ratio = 14.38) was associated with greater odds of severe coronavirus disease 2019, whereas diagnosis of anosmia (odds ratio = 0.45), ageusia (odds ratio = 0.46), and headache (odds ratio = 0.63) were associated with reduced odds of severe coronavirus disease 2019. 
CONCLUSIONS: Early CNS symptoms, and specifically encephalopathy, are differentially associated with risk of severe coronavirus disease 2019 and may serve as an early marker for differences in clinical disease course. Therapies for early coronavirus disease 2019 are scarce, and further identification of subgroups at risk may help to advance understanding of the severity trajectories and enable focused treatment.

13.
BMJ Open ; 11(3): e044933, 2021 03 23.
Article En | MEDLINE | ID: mdl-33757952

PURPOSE: A multicentre prospective cohort study, known as the Chinese Pregnant Women Cohort Study (CPWCS), was established in 2017 to collect exposure data during pregnancy (except environmental exposure) and analyse the relationship between lifestyle during pregnancy and obstetric outcomes. Data about mothers and their children's life and health as well as children's laboratory testing will be collected during the offspring follow-up of CPWCS, which will enable us to further investigate the longitudinal relationship between exposure in different periods (during pregnancy and childhood) and children's development. PARTICIPANTS: 9193 pregnant women in 24 hospitals in China who were in their first trimester (5-13 weeks gestational age) from 25 July 2017 to 26 November 2018 were included in CPWCS by convenience sampling. Five hospitals in China which participated in CPWCS with good cooperation will be selected as the sample source for the Chinese Pregnant Women Cohort Study (Offspring Follow-up) (CPWCS-OF). FINDINGS TO DATE: Some factors affecting pregnancy outcomes and health problems during pregnancy have been discovered through data analysis. The details are discussed in the 'Findings to date' section. FUTURE PLANS: Infants and children and their mothers who meet the criteria will be enrolled in the study and will be followed up every 2 years. The longitudinal relationship between exposure (questionnaire data, physical examination and biospecimens, medical records, and objective environmental data collected through geographical information system and remote sensing technology) in different periods (during pregnancy and childhood) and children's health (such as sleeping problems, oral health, bowel health and allergy-related health problems) will be analysed. TRIAL REGISTRATION NUMBER: CPWCS was registered with ClinicalTrials.gov on 18 January 2018: NCT03403543. CPWCS-OF was registered with ClinicalTrials.gov on 24 June 2020: NCT04444791.


Pregnant Women , Child , China/epidemiology , Cohort Studies , Female , Follow-Up Studies , Humans , Infant , Pregnancy , Prospective Studies
14.
ICMHI 2021 (2021) ; 2021: 296-303, 2021 May.
Article En | MEDLINE | ID: mdl-37954527

Causal artificial intelligence aims at developing bias-robust models that can be used to intervene on, rather than just predict, risks or outcomes. However, learning interventional models from observational data, including electronic health records (EHR), is challenging due to inherent bias, e.g., protopathic, confounding, collider. When estimating the effects of treatment interventions, classical approaches like propensity score matching are often used, but they pose limitations with large feature sets, nonlinear/nonparallel treatment group assignments, and collider bias. In this work, we used data from a large EHR consortium (OneFlorida) and evaluated causal statistical/machine learning methods for determining the effect of statin treatment on the risk of Alzheimer's disease, a debated clinical research question. We introduced a combination of directed acyclic graph (DAG) learning and comparison with expert's design, with calculation of the generalized adjustment criterion (GAC), to find an optimal set of covariates for estimation of treatment effects, ameliorating collider bias. The DAG/GAC approach was assessed together with traditional propensity score matching, inverse probability weighting, virtual-twin/counterfactual random forests, and deep counterfactual networks. We showed large heterogeneity in effect estimates upon different model configurations. Our results did not exclude a protective effect of statins, where the DAG/GAC point estimate aligned with the maximum credibility estimate, although the 95% credibility interval included a null effect, warranting further studies and replication.

15.
medRxiv ; 2020 Nov 05.
Article En | MEDLINE | ID: mdl-33173920

This study presents a natural language processing (NLP) tool to extract quantitative smoking information (e.g., Pack-Year, Quit Year, Smoking Year, and Pack per Day) from clinical notes and standardize it into Pack-Year units. We annotated a corpus of 200 clinical notes from patients who had low-dose CT imaging procedures for lung cancer screening and developed an NLP system using a two-layer rule-engine structure. We divided the 200 notes into a training set and a test set and developed the NLP system using only the training set. The experimental results on the test set showed that our NLP system achieved the best F1 scores of 0.963 and 0.946 for lenient and strict evaluation, respectively. NOTE: Accepted as a presentation at the 2020 IEEE International Conference on Healthcare Informatics (ICHI) Workshop on Health Natural Language Processing (HealthNLP 2020). https://ohnlp.github.io/HealthNLP2020/healthnlp2020# .
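
To make the standardization step concrete, the sketch below converts two common phrasings into pack-years, using the standard definition (packs per day multiplied by years smoked). The regular expressions and the function name are hypothetical and far simpler than the two-layer rule engine described above; quit-year arithmetic and other phrasings are not handled.

```python
import re

# Hypothetical pattern: an explicit pack-year mention, e.g. "30 pack-year history".
PACK_YEAR = re.compile(r"(\d+(?:\.\d+)?)\s*-?\s*pack[- ]?years?", re.IGNORECASE)

# Hypothetical pattern: packs per day plus duration, e.g. "1.5 packs per day for 20 years".
PPD_AND_YEARS = re.compile(
    r"(?P<ppd>\d+(?:\.\d+)?)\s*(?:packs?\s*(?:per|a|/)\s*day|ppd)"
    r"\s*(?:for|x)\s*(?P<years>\d+(?:\.\d+)?)\s*(?:years?|yrs?)",
    re.IGNORECASE,
)


def to_pack_years(text):
    """Return pack-years found in `text`, or None if neither pattern matches."""
    match = PACK_YEAR.search(text)
    if match:
        return float(match.group(1))
    match = PPD_AND_YEARS.search(text)
    if match:
        # pack-years = packs per day * years smoked
        return float(match.group("ppd")) * float(match.group("years"))
    return None


for note in ("Smoked 1.5 packs per day for 20 years.",
             "30 pack-year smoking history.",
             "Never smoker."):
    print(note, "->", to_pack_years(note))
```

A real system would also need negation handling and date arithmetic (for example, deriving smoking years from start and quit years), which this sketch leaves out.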

16.
J Am Med Inform Assoc ; 27(12): 1999-2010, 2020 12 09.
Article En | MEDLINE | ID: mdl-33166397

OBJECTIVE: To synthesize data quality (DQ) dimensions and assessment methods of real-world data, especially electronic health records, through a systematic scoping review and to assess the practice of DQ assessment in the national Patient-centered Clinical Research Network (PCORnet). MATERIALS AND METHODS: We started with 3 widely cited DQ publications: 2 reviews from Chan et al (2010) and Weiskopf et al (2013a) and 1 DQ framework from Kahn et al (2016). We then expanded our review systematically to cover relevant articles published up to February 2020. We extracted DQ dimensions and assessment methods from these studies, mapped their relationships, and organized a synthesized summarization of existing DQ dimensions and assessment methods. We reviewed the data checks employed by the PCORnet and mapped them to the synthesized DQ dimensions and methods. RESULTS: We analyzed a total of 3 reviews, 20 DQ frameworks, and 226 DQ studies and extracted 14 DQ dimensions and 10 assessment methods. We found that completeness, concordance, and correctness/accuracy were commonly assessed. Element presence, validity check, and conformance were commonly used DQ assessment methods and were the main focuses of the PCORnet data checks. DISCUSSION: Definitions of DQ dimensions and methods were not consistent in the literature, and the DQ assessment practice was not evenly distributed (eg, usability and ease-of-use were rarely discussed). Challenges in DQ assessments, given the complex and heterogeneous nature of real-world data, exist. CONCLUSION: The practice of DQ assessment is still limited in scope. Future work is warranted to generate understandable, executable, and reusable DQ measures.


Biomedical Research , Data Accuracy , Electronic Health Records/standards , Humans , Information Systems
17.
Article En | MEDLINE | ID: mdl-33786419

This study presents a natural language processing (NLP) tool to extract quantitative smoking information (e.g., Pack-Year, Quit Year, Smoking Year, and Pack per Day) from clinical notes and standardize it into Pack-Year units. We annotated a corpus of 200 clinical notes from patients who had low-dose CT imaging procedures for lung cancer screening and developed an NLP system using a two-layer rule-engine structure. We divided the 200 notes into a training set and a test set and developed the NLP system using only the training set. The experimental results on the test set showed that our NLP system achieved the best F1 scores of 0.963 and 0.946 for lenient and strict evaluation, respectively.

18.
AMIA Annu Symp Proc ; 2020: 393-401, 2020.
Article En | MEDLINE | ID: mdl-33936412

With vast amounts of patients' medical information, electronic health records (EHRs) are becoming one of the most important data sources in biomedical and health care research. Effectively integrating data from multiple clinical sites can help provide more generalized real-world evidence that is clinically meaningful. To analyze the clinical data from multiple sites, distributed algorithms have been developed to protect patient privacy without sharing individual-level medical information. In this paper, we applied the One-shot Distributed Algorithm for Cox proportional hazard model (ODAC) to the longitudinal data from the OneFlorida Clinical Research Consortium to demonstrate the feasibility of implementing distributed algorithms in large research networks. We studied the associations between clinical risk factors and the onset of Alzheimer's disease and related dementia (ADRD) to advance our understanding of the complex risk factors of ADRD and ultimately improve the care of ADRD patients.


Algorithms , Alzheimer Disease , Dementia , Electronic Health Records , Humans , Proportional Hazards Models , Risk Factors
19.
AMIA Annu Symp Proc ; 2020: 514-523, 2020.
Article En | MEDLINE | ID: mdl-33936425

Transgender and gender nonconforming (TGNC) individuals face significant marginalization, stigma, and discrimination. Under-reporting of TGNC individuals is common since they are often unwilling to self-identify. Meanwhile, the rapid adoption of electronic health record (EHR) systems has made large-scale, longitudinal real-world clinical data available to research and provided a unique opportunity to identify TGNC individuals using their EHRs, contributing to a promising routine health surveillance approach. Built upon existing work, we developed and validated a computable phenotype (CP) algorithm for identifying TGNC individuals and their natal sex (i.e., male-to-female or female-to-male) using both structured EHR data and unstructured clinical notes. Our CP algorithm achieved a 0.955 F1-score on the training data and a perfect F1-score on the independent testing data. Consistent with the literature, we observed an increasing percentage of TGNC individuals and a disproportionate burden of adverse health outcomes, especially sexually transmitted infections and mental health distress, in this population.


Algorithms , Decision Support Techniques , Electronic Health Records , Gender Identity , Sexual and Gender Minorities/psychology , Transgender Persons/psychology , Adolescent , Adult , Aged , Aged, 80 and over , Child , Child, Preschool , Female , Hormone Replacement Therapy/methods , Humans , Infant , Male , Middle Aged , Phenotype , Reproducibility of Results , Sex Reassignment Procedures , Young Adult
20.
AMIA Annu Symp Proc ; 2020: 1220-1229, 2020.
Article En | MEDLINE | ID: mdl-33936498

Because they contain detailed individual-level data on various patient characteristics, including medical conditions and treatment histories, electronic health record (EHR) systems have been widely adopted as an efficient source for health research. Compared to data from a single health system, real-world data (RWD) from multiple clinical sites provide a larger and more generalizable population for accurate estimation, leading to better decision making for health care. However, due to concerns over protecting patient privacy, it is challenging to share individual patient-level data across sites in practice. To tackle this issue, many distributed algorithms have been developed to transfer summary-level statistics to derive accurate estimates. Nevertheless, many of these algorithms require multiple rounds of communication to exchange intermediate results across different sites. Among them, the One-shot Distributed Algorithm for Logistic regression (termed ODAL) was developed to reduce communication overhead while protecting patient privacy. In this paper, we applied the ODAL algorithm to RWD from a large clinical data research network, the OneFlorida Clinical Research Consortium, and estimated the associations between risk factors and the diagnosis of opioid use disorder (OUD) among individuals who received at least one opioid prescription. The ODAL algorithm provided consistent findings of the associated risk factors and yielded better estimates than meta-analysis.


Algorithms , Data Mining/methods , Drug Prescriptions , Electronic Health Records , Opioid-Related Disorders , Computer Communication Networks , Confidentiality , Humans , Logistic Models , Opioid-Related Disorders/diagnosis , Prescription Drug Misuse , Risk Factors
...